Search CORE

23 research outputs found

The malSource dataset: quantifying complexity and code reuse in malware development

Author: Caballero Juan
Calleja Cortiñas Alejandro
Estévez Tapiador Juan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2019
Field of study

During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyberattacks and has consolidated as a commodity in the underground economy. In this paper, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increment of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software. We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are more commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry.This work was supported in part by the Spanish Government through MINECO grants SMOG-DEV (TIN2016-79095-C2-2-R) and DEDETIS (TIN2015-7013-R), and in part by the Regional Government of Madrid through grantsCIBERDINE (S2013/ICE-3095) and N-GREENS (S2013/ICE-2731)

Universidad Carlos III de Madrid e-Archivo

Optimization of code caves in malware binaries to evade machine learning detectors

Author: Estévez Tapiador Juan Manuel
Pardo Eduardo G.
Yuste Javier
Publication venue: Elservier
Publication date: 01/05/2022
Field of study

Machine Learning (ML) techniques, especially Artificial Neural Networks, have been widely adopted as a tool for malware detection due to their high accuracy when classifying programs as benign or malicious. However, these techniques are vulnerable to Adversarial Examples (AEs), i.e., carefully crafted samples designed by an attacker to be misclassified by the target model. In this work, we propose a general method to produce AEs from existing malware, which is useful to increase the robustness of ML-based models. Our method dynamically introduces unused blocks (caves) in malware binaries, preserving their original functionality. Then, by using optimization techniques based on Genetic Algorithms, we determine the most adequate content to place in such code caves to achieve misclassification. We evaluate our model in a black-box setting with a well-known state-of-the-art architecture (MalConv), resulting in a successful evasion rate of 97.99 % from the 2k tested malware samples. Additionally, we successfully test the transferability of our proposal to commercial AV engines available at VirusTotal, showing a reduction in the detection rate for the crafted AEs. Finally, the obtained AEs are used to retrain the ML-based malware detector previously evaluated, showing an improve on its robustness.This research was supported by the Ministerio de Ciencia, Innovación y Universidades (Grant Refs. PGC2018-095322-B-C22 and PID2019-111429RB-C21), by the Region of Madrid grant CYNAMON- CM (P2018/TCS-4566), co-financed by European Structural Funds ESF and FEDER, and the Excellence Program EPUC3M17

Universidad Carlos III de Madrid e-Archivo

Human identification using compressed ECG signals

Author: Cámara Núñez María Carmen
Estévez Tapiador Juan Manuel
Peris López Pedro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

As a result of the increased demand for improved life styles and the increment of senior citizens over the age of 65, new home care services are demanded. Simultaneously, the medical sector is increasingly becoming the new target of cybercriminals due the potential value of users' medical information. The use of biometrics seems an effective tool as a deterrent for many of such attacks. In this paper, we propose the use of electrocardiograms (ECGs) for the identification of individuals. For instance, for a telecare service, a user could be authenticated using the information extracted from her ECG signal. The majority of ECG-based biometrics systems extract information (fiducial features) from the characteristics points of an ECG wave. In this article, we propose the use of non-fiducial features via the Hadamard Transform (HT). We show how the use of highly compressed signals (only 24 coefficients of HT) is enough to unequivocally identify individuals with a high performance (classification accuracy of 0.97 and with identification system errors in the order of 10(-2)).This work was supported by the MINECO grant TIN2013-46469-R (SPINY: Security and Privacy in the Internet of You) and the CAM grant S2013/ICE-3095 (CIBERDINE: Cybersecurity, Data, and Risks)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Security and privacy issues in implantable medical devices: A comprehensive survey

Author: Cámara Núñez María Carmen
Estévez Tapiador Juan Manuel
Peris López Pedro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Bioengineering is a field in expansion. New technologies are appearing to provide a more efficient treatment of diseases or human deficiencies. Implantable Medical Devices (IMDs) constitute one example, these being devices with more computing, decision making and communication capabilities. Several research works in the computer security field have identified serious security and privacy risks in IMDs that could compromise the implant and even the health of the patient who carries it. This article surveys the main security goals for the next generation of IMDs and analyzes the most relevant protection mechanisms proposed so far. On the one hand, the security proposals must have into consideration the inherent constraints of these small and implanted devices: energy, storage and computing power. On the other hand, proposed solutions must achieve an adequate balance between the safety of the patient and the security level offered, with the battery lifetime being another critical parameter in the design phase. (C) 2015 Elsevier Inc. All rights reserved.This work was partially supported by the MINECO Grant TIN2013-46469-R (SPINY: Security and Privacy in the Internet of You)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Bootstrapping security policies for wearable Apps using attributed structural graphs

Author: Estévez Tapiador Juan Manuel
González-Tablas Ferreres Ana Isabel
Publication venue: 'MDPI AG'
Publication date: 01/01/2016
Field of study

We address the problem of bootstrapping security and privacy policies for newly-deployed apps in wireless body area networks (WBAN) composed of smartphones, sensors and other wearable devices. We introduce a framework to model such a WBAN as an undirected graph whose vertices correspond to devices, apps and app resources, while edges model structural relationships among them. This graph is then augmented with attributes capturing the features of each entity together with user-defined tags. We then adapt available graph-based similarity metrics to find the closest app to a new one to be deployed, with the aim of reusing, and possibly adapting, its security policy. We illustrate our approach through a detailed smartphone ecosystem case study. Our results suggest that the scheme can provide users with a reasonably good policy that is consistent with the user’s security preferences implicitly captured by policies already in place.MINECO grant TIN2013-46469-R (SPINY: Security and Privacy in the Internet of You)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Universidad Carlos III de Madrid e-Archivo

A survey of wearable biometric recognition systems

Author: Blasco Alís Jorge
Chen Thomas M.
Estévez Tapiador Juan Manuel
Peris López Pedro
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2016
Field of study

The growing popularity of wearable devices is leading to new ways to interact with the environment, with other smart devices, and with other people. Wearables equipped with an array of sensors are able to capture the owner's physiological and behavioural traits, thus are well suited for biometric authentication to control other devices or access digital services. However, wearable biometrics have substantial differences from traditional biometrics for computer systems, such as fingerprints, eye features, or voice. In this article, we discuss these differences and analyse how researchers are approaching the wearable biometrics field. We review and provide a categorization of wearable sensors useful for capturing biometric signals. We analyse the computational cost of the different signal processing techniques, an important practical factor in constrained devices such as wearables. Finally, we review and classify the most recent proposals in the field of wearable biometrics in terms of the structure of the biometric system proposed, their experimental setup, and their results. We also present a critique of experimental issues such as evaluation and feasibility aspects, and offer some final thoughts on research directions that need attention in future work.This work was partially supported by the MINECO grant TIN2013-46469-R (SPINY) and the CAM Grant S2013/ICE-3095 (CIBERDINE

Universidad Carlos III de Madrid e-Archivo

Heartbeats Do Not Make Good Pseudo-Random Number Generators: An Analysis of the Randomness of Inter-Pulse Intervals

Author: Estévez Tapiador Juan Manuel
Ortiz Martín Lara
Peris López Pedro
Picazo Sánchez Pablo
Publication venue: 'MDPI AG'
Publication date: 01/01/2018
Field of study

The proliferation of wearable and implantable medical devices has given rise to an interest in developing security schemes suitable for these systems and the environment in which they operate. One area that has received much attention lately is the use of (human) biological signals as the basis for biometric authentication, identification and the generation of cryptographic keys. The heart signal (e.g., as recorded in an electrocardiogram) has been used by several researchers in the last few years. Specifically, the so-called Inter-Pulse Intervals (IPIs), which is the time between two consecutive heartbeats, have been repeatedly pointed out as a potentially good source of entropy and are at the core of various recent authentication protocols. In this work, we report the results of a large-scale statistical study to determine whether such an assumption is (or not) upheld. For this, we have analyzed 19 public datasets of heart signals from the Physionet repository, spanning electrocardiograms from 1353 subjects sampled at different frequencies and with lengths that vary between a few minutes and several hours. We believe this is the largest dataset on this topic analyzed in the literature. We have then applied a standard battery of randomness tests to the extracted IPIs. Under the algorithms described in this paper and after analyzing these 19 public ECG datasets, our results raise doubts about the use of IPI values as a good source of randomness for cryptographic purposes. This has repercussions both in the security of some of the protocols proposed up to now and also in the design of future IPI-based schemes.This work was supported by the MINECO Grant TIN2013-46469-R (SPINY: Security and Privacy in the Internet of You); by the CAMGrant S2013/ICE-3095 (CIBERDINE: Cybersecurity, Data and Risks); and by the MINECO Grant TIN2016-79095-C2-2-R (SMOG-DEV: Security Mechanisms for fog computing: advanced security for Devices). This research has been supported by the Swedish Research Council (Vetenskapsrådet) under Grant No. 2015-04154 (PolUser: Rich User-Controlled Privacy Policies)

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Chalmers Research

Universidad Carlos III de Madrid e-Archivo

An analysis of fake social media engagement services

Author: Estévez Tapiador Juan Manuel
Nevado Catalán David
Pastrana Portillo Sergio
Vallina-Rodriguez Narseo
Publication venue: Elsevier
Publication date: 01/01/2023
Field of study

Fake engagement services allow users of online social media and other web platforms to illegitimately increase their online reach and boost their perceived popularity. Driven by socio-economic and even political motivations, the demand for fake engagement services has increased in the last years, which has incentivized the rise of a vast underground market and support infrastructure. Prior research in this area has been limited to the study of the infrastructure used to provide these services (e.g., botnets) and to the development of algorithms to detect and remove fake activity in online targeted platforms. Yet, the platforms in which these services are sold (known as panels) and the underground markets offering these services have not received much research attention. To fill this knowledge gap, this paper studies Social Media Management (SMM) panels, i.e., reselling platforms¿often found in underground forums¿in which a large variety of fake engagement services are offered. By daily crawling 86 representative SMM panels for 4 months, we harvest a dataset with 2.8 M forum entries grouped into 61k different services. This dataset allows us to build a detailed catalog of the services for sale, the platforms they target, and to derive new insights on fake social engagement services and its market. We then perform an economic analysis of fake engagement services and their trading activities by automatically analyzing 7k threads in underground forums. Our analysis reveals a broad range of offered services and levels of customization, where buyers can acquire fake engagement services by selecting features such as the quality of the service, the speed of delivery, the country of origin, and even personal attributes of the fake account (e.g., gender). The price analysis also yields interesting empirical results, showing significant disparities between prices of the same product across different markets. These observations suggest that the market is still undeveloped and sellers do not know the real market value of the services that they offer, leading them to underprice or overprice their services.This work was supported by the EU Horizon 2020 Research and Innovation Program under Grant agreement no. 101021377 (TRUST aWARE ); the Spanish grants ODIO (PID2019-111429RB-C21 and PID2019-111429RB-C22), and the Region of Madrid grant CYNAMON-CM (P2018/TCS-4566), co-financed by European Structural Funds ESF and FEDER

Universidad Carlos III de Madrid e-Archivo

DENDROID: A text mining approach to analyzing and classifying code structures in Android malware families

Author: Blasco Alís Jorge
Estévez Tapiador Juan Manuel
Peris López Pedro
Suárez-Tangil Guillermo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

The rapid proliferation of smartphones over the last few years has come hand in hand with and impressive growth in the number and sophistication of malicious apps targetting smartphone users. The availability of reuse-oriented development methodologies and automated malware production tools makes exceedingly easy to produce new specimens. As a result, market operators and malware analysts are increasingly overwhelmed by the amount of newly discovered samples that must be analyzed. This situation has stimulated research in intelligent instruments to automate parts of the malware analysis process. In this paper, we introduce DENDROID, a system based on text mining and information retrieval techniques for this task. Our approach is motivated by a statistical analysis of the code structures found in a dataset of ANDROID OS malware families, which reveals some parallelisms with classical problems in those domains. We then adapt the standard Vector Space Model and reformulate the modelling process followed in text mining applications. This enables us to measure similarity between malware samples, which is then used to automatically classify them into families. We also investigate the application of hierarchical clustering over the feature vectors obtained for each malware family. The resulting dendo-grams resemble the so-called phylogenetic trees for biological species, allowing us to conjecture about evolutionary relationships among families. Our experimental results suggest that the approach is remarkably accurate and deals efficiently with large databases of malware instances.Publicad

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

King's Research Portal

Randomized Anagram Revisited

Author: Estévez Tapiador Juan Manuel
Orfila Díaz-Pabon Agustín
Pastrana Portillo Sergio
Peris-López Pedro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

When compared to signature-based Intrusion Detection Systems (IDS), anomaly detectors present the potential advantage of detecting previously unseen attacks, which makes them an attractive solution against zero-day exploits and other attacks for which a signature is unavailable. Most anomaly detectors rely on machine learning algorithms to derive a model of normality that is later used to detect suspicious events. Such algorithms, however, are generally susceptible to evasion by means of carefully constructed attacks that are not recognized as anomalous. Different strategies to thwart evasion have been proposed over the last years, including the use of randomization to make somewhat uncertain how each packet will be processed. In this paper we analyze the strength of the randomization strategy suggested for Anagram, a well-known anomaly detector based on n-gram models. We show that an adversary who can interact with the system for a short period of time with inputs of his choosing will be able to recover the secret mask used to process packets. We describe and discuss an efficient algorithm to do this and report our experiences with a prototype implementation. Furthermore, we show that the specific form of randomization suggested for Anagram is a double-edged sword, as knowledge of the mask makes evasion easier than in the non-randomized case. We finally discuss a simple countermeasure to prevent our attacks.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo